HTM: A Topic Model for Hypertexts
Authors
Abstract
Previously, topic models such as PLSI (Probabilistic Latent Semantic Indexing) and LDA (Latent Dirichlet Allocation) were developed for modeling the contents of plain texts. Recently, topic models for processing hypertexts such as web pages have also been proposed. These hypertext models are generative models that give rise to both words and hyperlinks. This paper points out that, to better represent the contents of hypertexts, it is more essential to assume that the hyperlinks are fixed and to define the topic model as one that generates words only. The paper then proposes a new topic model for hypertext processing, referred to as the Hypertext Topic Model (HTM). HTM defines the distribution of words in a document (i.e., the content of the document) as a mixture over latent topics in the document itself and latent topics in the documents that the document cites. The topics are further characterized as distributions of words, as in conventional topic models. The paper further proposes a method for learning the HTM model. Experimental results show that HTM outperforms the baselines on topic discovery and document classification on three datasets.
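The mixture described in the abstract can be illustrated with a minimal sketch. This is not the paper's parameterization, which the abstract does not give: the function name `word_distribution`, the fixed mixing weight `own_weight`, and the uniform weighting of cited documents are all assumptions made for illustration only.

```python
def word_distribution(own_topics, cited_topics, topic_word, own_weight=0.5):
    """Word distribution as a mixture over a document's own topics and the
    topics of the documents it cites.

    own_topics:   K topic proportions for the document itself
    cited_topics: list of K-length topic proportions, one per cited document
    topic_word:   K x V nested list; row k is topic k's distribution over words
    own_weight:   hypothetical mixing weight (an assumption, not from the paper)
    """
    num_topics = len(topic_word)
    vocab_size = len(topic_word[0])

    if not cited_topics:
        # No citations: the document relies on its own topics only.
        mixed = list(own_topics)
    else:
        mixed = [own_weight * p for p in own_topics]
        # Assumption: the remaining mass is split uniformly over cited documents.
        share = (1.0 - own_weight) / len(cited_topics)
        for cited in cited_topics:
            for k in range(num_topics):
                mixed[k] += share * cited[k]

    # Each topic is a distribution over words; combine them by topic weight.
    return [
        sum(mixed[k] * topic_word[k][v] for k in range(num_topics))
        for v in range(vocab_size)
    ]


# Two topics over a three-word vocabulary.
topic_word = [[0.8, 0.2, 0.0],
              [0.0, 0.2, 0.8]]
own = [1.0, 0.0]        # the document itself is entirely about topic 0
cited = [[0.0, 1.0]]    # it cites one document that is entirely about topic 1
probs = word_distribution(own, cited, topic_word)
```

In this toy case the cited document pulls probability mass toward words of topic 1 even though the document's own topics never produce them, which is the effect the abstract attributes to treating hyperlinks as fixed and generating words only.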
Similar resources
Investigation of the Effect of Band Offset and Mobility of Organic/Inorganic HTM Layers on the Performance of Perovskite Solar Cells
Abstract: Perovskite solar cells have become an attractive subject in the solar energy device area. During ten years of development, the energy conversion efficiency has been improved from 2.2% to more than 22%, and it still has a very good potential for further enhancement. In this paper, a numerical model of the perovskite solar cell with the structure of glass/ FTO/ TiO2/...
HawkesTopic: A Joint Model for Network Inference and Topic Modeling from Text-Based Cascades
Understanding the diffusion of information in social networks and social media requires modeling the text diffusion process. In this work, we develop the HawkesTopic model (HTM) for analyzing text-based cascades, such as “retweeting a post” or “publishing a follow-up blog post.” HTM combines Hawkes processes and topic modeling to simultaneously reason about the information diffusion pathways an...
Specification and Design of Workflow-Driven Hypertexts
At present, the web combines many applications and seems to be everywhere. Web applications are therefore changing to meet new requirements such as the management of multiple users and complex dataflow. Brambilla, Ceri et al., in their article “Specification and Design of Workflow-Driven Hypertexts” (2002), introduce workflow-driven hypertexts, which are web-enabled hypertextual applications that...
Components of a Model of Context-Sensitive Hypertexts
On the background of rising Intranet applications the automatic generation of adaptable, context-sensitive hypertexts becomes more and more important [El-Beltagy et al., 2001]. This observation contradicts the literature on hypertext authoring, where Information Retrieval techniques prevail, which disregard any linguistic and context-theoretical underpinning. As a consequence, resulting hyperte...
The biologically inspired Hierarchical Temporal Memory
This work proposes a handwritten digit recognition system that is biologically inspired by the large-scale structure of the mammalian neocortex. Hierarchical Temporal Memory (HTM) is a memory-prediction network model that takes advantage of Bayesian belief propagation and revision techniques. In this article, a study has been conducted to train an HTM network to recognize handwritten digits ...
Publication date: 2008